For this assignment, you will be adding code to the Python script file neuralnetworksA4.py that you will download from here. The file neuralnetworksA4_initial.py currently contains the implementation of the NeuralNetwork class that is a solution to A3. It also contains an incomplete implementation of the subclass NeuralNetworkClassifier that extends NeuralNetwork, as discussed in class. Copy or rename this file to neuralnetworksA4.py and complete the implementation of NeuralNetworkClassifier. Your NeuralNetworkClassifier implementation should rely on functions inherited from NeuralNetwork as much as possible. Your neuralnetworksA4.py file (notice it is plural) will now contain two classes, NeuralNetwork and NeuralNetworkClassifier. The tar file neuralnetworksA4.tar also contains optimizers.py, the version of our optimizer code that you must use in this assignment.

In NeuralNetworkClassifier you will replace the _error_f function with one called _neg_log_likelihood_f. You will also have to define a new version of the _gradient_f function for NeuralNetworkClassifier.
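As a guide only, here is a minimal sketch of what these two methods could look like. It assumes the _forward, _softmax, and _backpropagate methods and the standardized X and indicator-variable T arguments suggested by the debugging output shown later in this notebook; it is not necessarily the exact structure expected by the provided code.

import sys
import numpy as np

def _neg_log_likelihood_f(self, X, T):
    # Sketch of a NeuralNetworkClassifier method: mean negative log likelihood of the target classes.
    Ys = self._forward(X)        # last element of the returned list holds the output-layer values
    Y = self._softmax(Ys[-1])    # class probabilities
    return -np.mean(T * np.log(Y + sys.float_info.epsilon))

def _gradient_f(self, X, T):
    # Sketch of the matching gradient: backpropagate the derivative of the mean
    # negative log likelihood with respect to the output-layer values.
    Ys = self._forward(X)
    Y = self._softmax(Ys[-1])
    n_samples, n_outputs = T.shape
    delta = -(T - Y) / (n_samples * n_outputs)
    return self._backpropagate(delta)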
Here are some example tests.
%load_ext autoreload
%autoreload 2
import numpy as np
import matplotlib.pyplot as plt
Import your completed neuralnetworksA4.py code that defines the NeuralNetwork and NeuralNetworkClassifier classes.
import neuralnetworksA4 as nn
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
T = np.array([[0], [1], [1], [0]])
X, T
(array([[0, 0], [1, 0], [0, 1], [1, 1]]), array([[0], [1], [1], [0]]))
np.random.seed(111)
nnet = nn.NeuralNetworkClassifier(2, [10], 2)
print(nnet)
NeuralNetworkClassifier(2, [10], 2) has not been trained.
nnet.Ws
[array([[ 0.12952296, -0.38212533, -0.07383268, 0.31091752, -0.23633798, -0.40511172, -0.55139454, -0.09211682, -0.30174387, -0.18745848], [ 0.56662595, -0.3028474 , -0.48359706, 0.19583749, 0.13999926, -0.26066957, -0.03900416, -0.44067096, -0.49195143, 0.46277416], [ 0.33943873, 0.39325596, 0.36397022, 0.56690583, 0.08922813, 0.36230683, -0.09085429, -0.5456561 , -0.05295844, -0.45573018]]), array([[0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.]])]
The _error_f function is replaced with _neg_log_likelihood_f. If you add some print statements in your _neg_log_likelihood_f function, you can compare your output to the following results.
nnet.set_debug(True)
Debugging information will now be printed.
nnet.train(X, T, X, T, n_epochs=1, method='sgd', learning_rate=0.01)
In _neg_log_likelihood_f: arguments are X (standardized): [[-1. -1.] [ 1. -1.] [-1. 1.] [ 1. 1.]] T (indicator variables): [[1. 0.] [0. 1.] [0. 1.] [1. 0.]] Result of call to self._forward is: [array([[-1., -1.], [ 1., -1.], [-1., 1.], [ 1., 1.]]), array([[-0.65071726, -0.44024437, 0.04576217, -0.42339865, -0.43460927, -0.46740829, -0.39822372, 0.71346703, 0.23848393, -0.19208626], [ 0.34231299, -0.79254131, -0.72655903, -0.06007838, -0.18346578, -0.77314041, -0.46175878, 0.0128676 , -0.62959015, 0.62370478], [-0.09735492, 0.30405172, 0.64909581, 0.5928089 , -0.27947188, 0.2144819 , -0.53935438, -0.19458859, 0.13639376, -0.80263068], [ 0.77613968, -0.28371417, -0.19108161, 0.79083647, -0.00711046, -0.294489 , -0.59233336, -0.79262132, -0.68931726, -0.1784822 ]]), array([[0., 0.], [0., 0.], [0., 0.], [0., 0.]])] Result of _softmax is: [[0.5 0.5] [0.5 0.5] [0.5 0.5] [0.5 0.5]] Result of np.log(Y + sys.float_info.epsilon) is: [[-0.69314718 -0.69314718] [-0.69314718 -0.69314718] [-0.69314718 -0.69314718] [-0.69314718 -0.69314718]] _neg_log_likelihood_f returns: 0.3465735902799724 in _backpropagate: first delta calculated is [[-0.0625 0.0625] [ 0.0625 -0.0625] [ 0.0625 -0.0625] [-0.0625 0.0625]] in _backpropagate: next delta is [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] in _backpropagate: next delta is [[0. 0.] [0. 0.] [0. 0.] [0. 0.]] In _neg_log_likelihood_f: arguments are X (standardized): [[-1. -1.] [ 1. -1.] [-1. 1.] [ 1. 1.]] T (indicator variables): [[1. 0.] [0. 1.] [0. 1.] [1. 0.]] Result of call to self._forward is: [array([[-1., -1.], [ 1., -1.], [-1., 1.], [ 1., 1.]]), array([[-0.65071726, -0.44024437, 0.04576217, -0.42339865, -0.43460927, -0.46740829, -0.39822372, 0.71346703, 0.23848393, -0.19208626], [ 0.34231299, -0.79254131, -0.72655903, -0.06007838, -0.18346578, -0.77314041, -0.46175878, 0.0128676 , -0.62959015, 0.62370478], [-0.09735492, 0.30405172, 0.64909581, 0.5928089 , -0.27947188, 0.2144819 , -0.53935438, -0.19458859, 0.13639376, -0.80263068], [ 0.77613968, -0.28371417, -0.19108161, 0.79083647, -0.00711046, -0.294489 , -0.59233336, -0.79262132, -0.68931726, -0.1784822 ]]), array([[ 2.81243949e-04, -2.81243949e-04], [ 1.30260852e-04, -1.30260852e-04], [-7.34786885e-05, 7.34786885e-05], [-1.04105784e-04, 1.04105784e-04]])] Result of _softmax is: [[0.50014062 0.49985938] [0.50006513 0.49993487] [0.49996326 0.50003674] [0.49994795 0.50005205]] Result of np.log(Y + sys.float_info.epsilon) is: [[-0.69286598 -0.69342846] [-0.69301693 -0.69327745] [-0.69322066 -0.6930737 ] [-0.69325129 -0.69304308]] _neg_log_likelihood_f returns: 0.34655855279873804 In _neg_log_likelihood_f: arguments are X (standardized): [[-1. -1.] [ 1. -1.] [-1. 1.] [ 1. 1.]] T (indicator variables): [[1. 0.] [0. 1.] [0. 1.] [1. 
0.]] Result of call to self._forward is: [array([[-1., -1.], [ 1., -1.], [-1., 1.], [ 1., 1.]]), array([[-0.65071726, -0.44024437, 0.04576217, -0.42339865, -0.43460927, -0.46740829, -0.39822372, 0.71346703, 0.23848393, -0.19208626], [ 0.34231299, -0.79254131, -0.72655903, -0.06007838, -0.18346578, -0.77314041, -0.46175878, 0.0128676 , -0.62959015, 0.62370478], [-0.09735492, 0.30405172, 0.64909581, 0.5928089 , -0.27947188, 0.2144819 , -0.53935438, -0.19458859, 0.13639376, -0.80263068], [ 0.77613968, -0.28371417, -0.19108161, 0.79083647, -0.00711046, -0.294489 , -0.59233336, -0.79262132, -0.68931726, -0.1784822 ]]), array([[ 2.81243949e-04, -2.81243949e-04], [ 1.30260852e-04, -1.30260852e-04], [-7.34786885e-05, 7.34786885e-05], [-1.04105784e-04, 1.04105784e-04]])] Result of _softmax is: [[0.50014062 0.49985938] [0.50006513 0.49993487] [0.49996326 0.50003674] [0.49994795 0.50005205]] Result of np.log(Y + sys.float_info.epsilon) is: [[-0.69286598 -0.69342846] [-0.69301693 -0.69327745] [-0.69322066 -0.6930737 ] [-0.69325129 -0.69304308]] _neg_log_likelihood_f returns: 0.34655855279873804 SGD: Epoch 1 Likelihood = Train 0.70712 Validate 0.70712
NeuralNetworkClassifier(2, [10], 2)
print(nnet)
NeuralNetworkClassifier(2, [10], 2) trained for 1 epochs with final likelihoods of 0.7071 train 0.7071 validation. Network weights set to best weights from epoch 0 for validation likelihood of 0.7071174143714485.
Now if you turn off debugging, most print statements will be suppressed so you can run for more epochs without tons of output.
nnet.set_debug(False)
No debugging information will be printed.
The use() function returns two numpy arrays. The first contains the class prediction for each sample, drawn from the set of unique values in the T passed into the train() function. The second contains the probability of each class for each sample, with a column for each unique value in T.
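For intuition only, here is a small illustration (not the required implementation) of how the class predictions can be recovered from the softmax probabilities, using the probabilities printed below and the unique values of T seen by train():

classes = np.unique(T)            # array([0, 1])
probs = np.array([[0.50014062, 0.49985938],
                  [0.50006513, 0.49993487],
                  [0.49996326, 0.50003674],
                  [0.49994795, 0.50005205]])
predictions = classes[np.argmax(probs, axis=1)].reshape(-1, 1)
predictions                       # [[0], [0], [1], [1]], matching the first array returned by nnet.use(X) below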
nnet.use(X)
(array([[0], [0], [1], [1]]), array([[0.50014062, 0.49985938], [0.50006513, 0.49993487], [0.49996326, 0.50003674], [0.49994795, 0.50005205]]))
def percent_correct(Y, T):
    return np.mean(T == Y) * 100
percent_correct(nnet.use(X)[0], T)
50.0
The XOR problem was used early in the history of neural networks as a problem that cannot be solved with a linear model. Let's try it.
nnet = nn.NeuralNetworkClassifier(2, [], 2) # [], so no hidden layers, just a linear model
nnet.train(X, T, X, T, 100, method='sgd', learning_rate=0.1)
SGD: Epoch 10 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 20 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 30 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 40 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 50 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 60 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 70 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 80 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 90 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 100 Likelihood = Train 0.70711 Validate 0.70711
NeuralNetworkClassifier(2, [], 2)
print(nnet)
NeuralNetworkClassifier(2, [], 2) trained for 100 epochs with final likelihoods of 0.7071 train 0.7071 validation. Network weights set to best weights from epoch 0 for validation likelihood of 0.7071067811865477.
nnet.use(X)
(array([[0], [0], [0], [0]]), array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]))
percent_correct(nnet.use(X)[0], T)
50.0
Now try with one hidden layer containing one unit.
nnet = nn.NeuralNetworkClassifier(2, [1], 2)
nnet.train(X, T, X, T, 100, method='adamw', learning_rate=0.1)
AdamW: Epoch 10 Likelihood = Train 0.75498 Validate 0.75498
AdamW: Epoch 20 Likelihood = Train 0.78091 Validate 0.78091
AdamW: Epoch 30 Likelihood = Train 0.78535 Validate 0.78535
AdamW: Epoch 40 Likelihood = Train 0.78640 Validate 0.78640
AdamW: Epoch 50 Likelihood = Train 0.78677 Validate 0.78677
AdamW: Epoch 60 Likelihood = Train 0.78695 Validate 0.78695
AdamW: Epoch 70 Likelihood = Train 0.78705 Validate 0.78705
AdamW: Epoch 80 Likelihood = Train 0.78712 Validate 0.78712
AdamW: Epoch 90 Likelihood = Train 0.78718 Validate 0.78718
AdamW: Epoch 100 Likelihood = Train 0.78723 Validate 0.78723
NeuralNetworkClassifier(2, [1], 2)
Y, probs = nnet.use(X)
print(Y)
percent_correct(Y, T)
[[0] [1] [0] [0]]
75.0
One hidden unit didn't work. Let's try five hidden units.
nnet = nn.NeuralNetworkClassifier(2, [5], 2)
nnet.train(X, T, X, T, 400, method='adamw')
AdamW: Epoch 40 Likelihood = Train 0.99988 Validate 0.99988
AdamW: Epoch 80 Likelihood = Train 0.99992 Validate 0.99992
AdamW: Epoch 120 Likelihood = Train 0.99993 Validate 0.99993
AdamW: Epoch 160 Likelihood = Train 0.99994 Validate 0.99994
AdamW: Epoch 200 Likelihood = Train 0.99995 Validate 0.99995
AdamW: Epoch 240 Likelihood = Train 0.99995 Validate 0.99995
AdamW: Epoch 280 Likelihood = Train 0.99996 Validate 0.99996
AdamW: Epoch 320 Likelihood = Train 0.99996 Validate 0.99996
AdamW: Epoch 360 Likelihood = Train 0.99996 Validate 0.99996
AdamW: Epoch 400 Likelihood = Train 0.99997 Validate 0.99997
NeuralNetworkClassifier(2, [5], 2)
print(nnet)
NeuralNetworkClassifier(2, [5], 2) trained for 400 epochs with final likelihoods of 1.0000 train 1.0000 validation. Network weights set to best weights from epoch 399 for validation likelihood of 0.9999679983795212.
Y, probs = nnet.use(X)
print(Y)
percent_correct(Y, T)
[[0] [1] [1] [0]]
100.0
A second way to evaluate a classifier is to calculate a confusion matrix. This shows the percent accuracy for each class, and also shows which classes are predicted in error.
Here is a function you can use to show a confusion matrix.
import pandas

def confusion_matrix(Y_classes, T):
    class_names = np.unique(T)
    table = []
    for true_class in class_names:
        row = []
        for Y_class in class_names:
            row.append(100 * np.mean(Y_classes[T == true_class] == Y_class))
        table.append(row)
    conf_matrix = pandas.DataFrame(table, index=class_names, columns=class_names)
    print('Percent Correct')
    return conf_matrix.style.background_gradient(cmap='Blues').format("{:.1f}")
nnet.best_epoch
399
nnet.use(X)
(array([[0], [1], [1], [0]]), array([[9.99948915e-01, 5.10846472e-05], [2.59792751e-05, 9.99974021e-01], [1.10633564e-04, 9.99889366e-01], [9.99931691e-01, 6.83094785e-05]]))
confusion_matrix(nnet.use(X)[0], T)
Percent Correct
|   | 0 | 1 |
|---|---|---|
| 0 | 100.0 | 0.0 |
| 1 | 0.0 | 100.0 |
for method in ('sgd', 'adamw', 'scg'):
    nnet = nn.NeuralNetworkClassifier(2, [20, 20], 2)
    nnet.train(X, T, X, T, 400, method=method, learning_rate=0.1, momentum=0.9, verbose=False)
    pc = percent_correct(nnet.use(X)[0], T)
    print(f'{method} % Correct: {pc:.0f}')
sgd % Correct: 100
adamw % Correct: 100
scg % Correct: 100
Apply NeuralNetworkClassifier to Handwritten Digits

Apply your NeuralNetworkClassifier to the MNIST digits dataset. First, make sure your solution works on the following examples. Then complete make_mnist_classifier and use it as instructed below.
import pickle
import gzip
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
Xtrain = train_set[0]
Ttrain = train_set[1].reshape(-1, 1)
Xval = valid_set[0]
Tval = valid_set[1].reshape(-1, 1)
Xtest = test_set[0]
Ttest = test_set[1].reshape(-1, 1)
print(Xtrain.shape, Ttrain.shape, Xval.shape, Tval.shape, Xtest.shape, Ttest.shape)
(50000, 784) (50000, 1) (10000, 784) (10000, 1) (10000, 784) (10000, 1)
28*28
784
def draw_digit(image, label, predicted_label=None):
    plt.imshow(-image.reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.axis('off')
    title = str(label)
    color = 'black'
    if predicted_label is not None:
        title += ' as {}'.format(predicted_label)
        if predicted_label != label:
            color = 'red'
    plt.title(title, color=color)
plt.figure(figsize=(7, 7))
for i in range(100):
    plt.subplot(10, 10, i + 1)
    draw_digit(Xtrain[i], Ttrain[i, 0])
plt.tight_layout()
nnet = nn.NeuralNetworkClassifier(784, [12], 10)
# nnet = nn.NeuralNetworkClassifier(784, [100, 50, 20, 50], 10)
nnet.train(Xtrain, Ttrain, Xval, Tval, n_epochs=100, batch_size=-1, method='scg') # , learning_rate=0.1)
print(nnet)
SCG: Epoch 10 Likelihood= Train 0.94325 Validate 0.94576
SCG: Epoch 20 Likelihood= Train 0.96238 Validate 0.96252
SCG: Epoch 30 Likelihood= Train 0.97022 Validate 0.96881
SCG: Epoch 40 Likelihood= Train 0.97452 Validate 0.97166
SCG: Epoch 50 Likelihood= Train 0.97693 Validate 0.97244
SCG: Epoch 60 Likelihood= Train 0.97899 Validate 0.97315
SCG: Epoch 70 Likelihood= Train 0.98061 Validate 0.97320
SCG: Epoch 80 Likelihood= Train 0.98177 Validate 0.97281
SCG: Epoch 90 Likelihood= Train 0.98235 Validate 0.97299
SCG: Epoch 100 Likelihood= Train 0.98321 Validate 0.97263
NeuralNetworkClassifier(784, [12], 10) trained for 100 epochs with final likelihoods of 0.9832 train 0.9726 validation. Network weights set to best weights from epoch 64 for validation likelihood of 0.973500774228357.
def first_100_tests(nnet, Xtest, Ttest):
    plt.figure(figsize=(7, 7))
    Ytest, _ = nnet.use(Xtest[:100, :])
    for i in range(100):
        plt.subplot(10, 10, i + 1)
        draw_digit(Xtest[i], Ttest[i, 0], Ytest[i, 0])
    plt.tight_layout()
first_100_tests(nnet, Xtest, Ttest)
Experiment with the three different optimization methods, at least three hidden layer structures including [], two learning rates, and two numbers of epochs. Use verbose=False as an argument to train(). For scg, ignore the learning rate loop. Print a single line for each run showing method, number of epochs, learning rate, hidden layer structure, and percent correct for training, validation, and testing data. Here is an example line:
sgd 10 0.1 [] 77.16 79.22 79.05
Use a pandas.DataFrame to show your results with columns labeled correctly.
# ...
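One possible skeleton for these experiments is sketched below, only as an illustration. The hidden layer structures, learning rates, and epoch counts shown are placeholders, not recommended settings, and whether train() needs a batch_size argument for MNIST depends on your implementation.

import pandas as pd

results = []
for method in ('sgd', 'adamw', 'scg'):
    for n_epochs in (10, 100):                                        # placeholder values
        learning_rates = (0.1,) if method == 'scg' else (0.01, 0.1)   # scg ignores the learning rate
        for lr in learning_rates:
            for hiddens in ([], [10], [20, 20]):                      # placeholder structures
                nnet = nn.NeuralNetworkClassifier(Xtrain.shape[1], hiddens, len(np.unique(Ttrain)))
                nnet.train(Xtrain, Ttrain, Xval, Tval, n_epochs,
                           method=method, learning_rate=lr, verbose=False)
                pcs = [percent_correct(nnet.use(X)[0], T)
                       for X, T in ((Xtrain, Ttrain), (Xval, Tval), (Xtest, Ttest))]
                print(method, n_epochs, lr, hiddens, *[f'{pc:.2f}' for pc in pcs])
                results.append([method, n_epochs, lr, hiddens] + pcs)

pd.DataFrame(results, columns=['Method', 'Epochs', 'Learning Rate', 'Hidden Layers',
                               'Train %', 'Validation %', 'Test %'])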
Complete the following function.
def make_mnist_classifier(Xtrain, Ttrain, Xvalidate, Tvalidate, Xtest, Ttest,
                          n_hiddens_each_layer, n_epochs, batch_size=-1,
                          method='adamw', learning_rate=0.1, momentum=0.9):

    from IPython.display import display  # to display the confusion matrix in the last step of this function

    # Create NeuralNetworkClassifier object
    # ...

    # Train it.
    # ...

    # Plot the performance trace with legend (f'{method} Train Data', f'{method} Validation Data')
    # Also plot a vertical line at the best epoch, using code like plt.axvline(nnet.best_epoch, lw=3, alpha=0.5)
    # ...

    # Show the results on the first 100 test images.
    # ...

    plt.show()

    # Print the network
    print(nnet)

    # Print percent correct on training data, validation data and test data.
    # ...

    # Print a confusion matrix using the trained neural network applied to the testing data.
    # display( ... )
Here is an example of what your function should produce.
hiddens = [5]
n_epochs = 40
batch_size = -1
method = 'adamw'
learning_rate = 0.1
make_mnist_classifier(Xtrain, Ttrain, Xval, Tval, Xtest, Ttest, hiddens, n_epochs, batch_size, method, learning_rate)
AdamW: Epoch 4 Likelihood = Train 0.86147 Validate 0.86344
AdamW: Epoch 8 Likelihood = Train 0.89506 Validate 0.89917
AdamW: Epoch 12 Likelihood = Train 0.90809 Validate 0.91218
AdamW: Epoch 16 Likelihood = Train 0.91348 Validate 0.91781
AdamW: Epoch 20 Likelihood = Train 0.91704 Validate 0.92071
AdamW: Epoch 24 Likelihood = Train 0.91949 Validate 0.92272
AdamW: Epoch 28 Likelihood = Train 0.92167 Validate 0.92427
AdamW: Epoch 32 Likelihood = Train 0.92380 Validate 0.92533
AdamW: Epoch 36 Likelihood = Train 0.92530 Validate 0.92575
AdamW: Epoch 40 Likelihood = Train 0.92657 Validate 0.92669
NeuralNetworkClassifier(784, [5], 10) trained for 40 epochs with final likelihoods of 0.9266 train 0.9267 validation. Network weights set to best weights from epoch 39 for validation likelihood of 0.9266944118121945.
Training 73.590 % correct
Validation 73.620 % correct
Testing 72.500 % correct
Percent Correct
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 93.0 | 0.1 | 1.0 | 1.2 | 0.3 | 1.6 | 1.8 | 0.1 | 0.8 | 0.0 |
| 1 | 1.2 | 92.1 | 0.6 | 1.1 | 0.4 | 0.0 | 0.3 | 2.5 | 1.9 | 0.0 |
| 2 | 4.7 | 1.4 | 68.0 | 15.5 | 1.1 | 1.6 | 1.8 | 0.6 | 5.4 | 0.0 |
| 3 | 1.5 | 0.4 | 4.8 | 78.5 | 0.6 | 0.8 | 1.1 | 2.6 | 9.8 | 0.0 |
| 4 | 0.1 | 0.1 | 0.4 | 0.6 | 91.1 | 1.6 | 2.1 | 2.3 | 0.6 | 0.9 |
| 5 | 5.8 | 0.4 | 4.0 | 3.1 | 10.1 | 57.5 | 4.5 | 1.6 | 12.6 | 0.3 |
| 6 | 3.8 | 0.2 | 2.5 | 0.5 | 0.9 | 1.1 | 90.8 | 0.0 | 0.1 | 0.0 |
| 7 | 0.5 | 1.2 | 0.3 | 8.8 | 1.9 | 0.8 | 0.1 | 83.3 | 2.4 | 0.8 |
| 8 | 1.7 | 0.2 | 7.6 | 9.7 | 1.6 | 9.7 | 0.4 | 3.4 | 65.7 | 0.0 |
| 9 | 1.0 | 0.0 | 0.3 | 1.3 | 7.2 | 3.7 | 0.3 | 83.0 | 0.8 | 2.5 |
Use your function to show results with the three different optimization methods, using values for the hidden layer structure, learning rate, and number of epochs that work well, meaning over 90% correct on test data.
# ...
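For example, a call pattern like the following could be used. The hidden layer structure, number of epochs, and learning rate here are placeholders, not recommended values.

for method in ('sgd', 'adamw', 'scg'):
    make_mnist_classifier(Xtrain, Ttrain, Xval, Tval, Xtest, Ttest,
                          [50, 20], 100, batch_size=-1,
                          method=method, learning_rate=0.1)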
Discuss your results. In your discussion, include observations about which method achieves the best result, which method seems to do best with fewer epochs, what common classification mistakes are made as shown in your confusion matrices, and whether larger networks (more layers, more units) work better than small networks.
write your comments here
Tar or zip your Jupyter notebook (A4solution.ipynb) and your Python script file (neuralnetworksA4.py) into a file named A4.tar or A4.zip. Check in the tar or zip file in Canvas.

Download A4grader.zip and extract A4grader.py before running the following cell. Remember, you are expected to design and run your own tests in addition to the tests provided in A4grader.py.
%run -i A4grader.py
======================= Code Execution ======================= ============================ import neuralnetworksA4 as nn ============================ neuralnetworksA4.py defines NeuralNetwork and NeuralNetworkClassifier ================================================================================ Testing this for 10 points: # Checking that NeuralNetworkClassifier is subcless of NeuralNetwork # and test result with issubclass(nn.NeuralNetworkClassifier, nn.NeuralNetwork) ---------------------------------------------------------------------- ---- 10/10 points. Correct class inheritance. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if the _forward function in NeuralNetworkClassifier is inherited from NeuralNetwork import inspect forward_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == 'forward' or f.name == '_forward')] # and test result with forward_func[0].defining_class == nn.NeuralNetwork ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier _forward function correctly inherited from NeuralNetwork. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if __str__ is overridden in NeuralNetworkClassifier import inspect str_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == '__str__')] # and test result with str_func[0].defining_class == nn.NeuralNetworkClassifier ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier __str__ function correctly overridden in NeuralNetworkClassifier. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if _gradient_f in NeuralNetworkClassifier is defined (overridden) in NeuralNetworkClassifier import inspect str_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == '_gradient_f')] # and test result with str_func[0].defining_class == nn.NeuralNetworkClassifier ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier _gradient_f function correctly defined in NeuralNetworkClassifier. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if _backpropagate in NeuralNetworkClassifier is inherited from NeuralNetwork import inspect str_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == '_backpropagate')] # and test result with str_func[0].defining_class == nn.NeuralNetwork ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier _backpropagate function correctly inherited from NeuralNetwork. 
---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: nnet = nn.NeuralNetworkClassifier(2, [], 5) W_shapes = [W.shape for W in nnet.Ws] correct = [(3, 5)] # and test result with correct == W_shapes ---------------------------------------------------------------------- ---- 10/10 points. W_shapes is correct value of [(3, 5)]. ---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: nnet = nn.NeuralNetworkClassifier(2, [], 5) G_shapes = [G.shape for G in nnet.Grads] correct = [(3, 5)] # and test result with correct == G_shapes ---------------------------------------------------------------------- ---- 10/10 points. G_shapes is correct value of [(3, 5)] ---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: np.random.seed(42) X = np.random.uniform(0, 1, size=(100, 2)) T = (np.abs(X[:, 0:1] - 0.5) > 0.3).astype(int) nnet = nn.NeuralNetworkClassifier(2, [10, 5], len(np.unique(T))) nnet.train(X, T, X, T, 20, method='scg') last_error = nnet.get_performance_trace()[-1] correct = 0.9297448356260026 SCG: Epoch 2 Likelihood= Train 0.70967 Validate 0.70967 SCG: Epoch 4 Likelihood= Train 0.71795 Validate 0.71795 SCG: Epoch 6 Likelihood= Train 0.73195 Validate 0.73195 SCG: Epoch 8 Likelihood= Train 0.79354 Validate 0.79354 SCG: Epoch 10 Likelihood= Train 0.84909 Validate 0.84909 SCG: Epoch 12 Likelihood= Train 0.88401 Validate 0.88401 SCG: Epoch 14 Likelihood= Train 0.90492 Validate 0.90492 SCG: Epoch 16 Likelihood= Train 0.94989 Validate 0.94989 SCG: Epoch 18 Likelihood= Train 0.96324 Validate 0.96324 SCG: Epoch 20 Likelihood= Train 0.97509 Validate 0.97509 # and test result with np.allclose(last_error, correct, atol=0.1) ---------------------------------------------------------------------- ---- 10/10 points. Correct values in performance_trace. ---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: np.random.seed(43) X = np.random.uniform(0, 1, size=(100, 2)) T = (np.abs(X[:, 0:1] - X[:, 1:2]) < 0.5).astype(int) T[T == 0] = 10 T[T == 1] = 20 # Unique class labels are now 10 and 20! nnet = nn.NeuralNetworkClassifier(2, [10, 5], 2) nnet.train(X, T, X, T, 200, method='scg') classes, prob = nnet.use(X) correct_classes = np.array([[20], [20], [10], [20], [10], [20], [20], [10], [20], [10], [20], [10], [20], [10], [20], [10], [20], [20], [20], [20]]) SCG: Epoch 20 Likelihood= Train 0.96174 Validate 0.96174 SCG: Epoch 40 Likelihood= Train 0.98515 Validate 0.98515 SCG: Epoch 60 Likelihood= Train 0.99438 Validate 0.99438 SCG: Epoch 80 Likelihood= Train 0.99730 Validate 0.99730 SCG: Epoch 100 Likelihood= Train 0.99730 Validate 0.99730 SCG: Epoch 120 Likelihood= Train 0.99786 Validate 0.99786 SCG: Epoch 140 Likelihood= Train 0.99851 Validate 0.99851 SCG: Epoch 160 Likelihood= Train 0.99860 Validate 0.99860 SCG: Epoch 180 Likelihood= Train 0.99893 Validate 0.99893 SCG: Epoch 200 Likelihood= Train 0.99936 Validate 0.99936 # and test result with np.allclose(classes, correct_classes, atol=0.1) ---------------------------------------------------------------------- ---- 10/10 points. Correct values in classes. 
---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: correct_prob = np.array([[7.87686254e-10, 9.99999999e-01], [2.64073742e-10, 1.00000000e+00], [1.00000000e+00, 2.17739214e-11], [2.37507101e-10, 1.00000000e+00], [1.00000000e+00, 5.72602779e-13], [2.63951189e-10, 1.00000000e+00], [3.07141256e-10, 1.00000000e+00], [9.99999995e-01, 5.31601303e-09], [5.18960837e-10, 9.99999999e-01], [1.00000000e+00, 5.29910868e-15], [2.31535786e-10, 1.00000000e+00], [1.00000000e+00, 4.76259538e-17], [2.31088925e-10, 1.00000000e+00], [1.00000000e+00, 3.30767340e-16], [3.09810289e-10, 1.00000000e+00], [9.99999999e-01, 7.34252931e-10], [2.31089312e-10, 1.00000000e+00], [2.32724737e-10, 1.00000000e+00], [2.69944404e-10, 1.00000000e+00], [2.59802216e-10, 1.00000000e+00]]) # and test result with np.allclose(probs, correct_probs, atol=0.1) ---------------------------------------------------------------------- ---- 10/10 points. Correct values in probs. ---------------------------------------------------------------------- ====================================================================== A4 Execution Grade is 80 / 80 ====================================================================== -- / 5 points. Experiment with the three different optimization methods, at least three hidden layer structures including [], two learning rates, and two numbers of epochs. Use verbose=False as an argument to train(). For scg, ignore the learning rate loop. Print a single line for each run showing method, number of epochs, learning rate, hidden layer structure, and percent correct for training, validation, and testing data. __ / 5 points. Function make_mnist_classifier defined and used correctly. __ / 5 points. Discuss your results. In your discussion, include observations about which method achieves the best result, which method seems to do best with fewer epochs, what common classification mistakes are made as shown in your confusion matrices, and do larger networks (more layers, more units) work better than small networks? __ / 5 points. Train a network with values for method, learning rate, number of epochs, and a hidden layer structure with no more than 100 units in the first layer that you found work well. Extract the weight matrix from the first layer. Now, for each unit (column in the weight matrix) ignore the first row of bias weights and reshape the remaining weights into a 28 x 28 image for each unit and display them. Complete the function to draw the weight matrix for one unit using draw_digit as a guide, then use it in a loop to draw the weight matrices for each unit in the first layer of your network. Discuss what you see. Describe some of the images as patterns that could be useful for classifying particular digits. ====================================================================== A4 Results and Discussion Grade is ___ / 20 ====================================================================== ====================================================================== A4 FINAL GRADE is _ / 100 ====================================================================== Extra Credit (2 points possible): Extra Credit for 1 point: Repeat the above experiments with a different classification data set. Randonly partition your data into training, validaton and test parts if not already provided. Write in markdown cells descriptions of the data and your results. of the data and your results. 
Extra Credit for 1 point: Train a network with values for method, learning rate, number of epochs, and a hidden layer structure with no more than 100 units in the first layer that you found work well. Extract the weight matrix from the first layer. Now, for each unit (column in the weight matrix) ignore the first row of bias weights and reshape the remaining weights into a 28 x 28 image for each unit and display them. Complete the following function to draw the weight matrix for one unit using `draw_digit` as a guide, then use it in a loop to draw the weight matrices for each unit in the first layer of your network. Discuss what you see. Describe some of the images as patterns that could be useful for classifying particular digits. A4 EXTRA CREDIT is 0 / 2
Train a network with values for method, learning rate, number of epochs, and a hidden layer structure with no more than 100 units in the first layer that you found work well. Extract the weight matrix from the first layer. Now, for each unit (column in the weight matrix), ignore the first row of bias weights and reshape the remaining weights into a 28 x 28 image for each unit and display them. Complete the following function to draw the weight matrix for one unit using draw_digit as a guide, then use it in a loop to draw the weight matrices for each unit in the first layer of your network. Discuss what you see. Describe some of the images as patterns that could be useful for classifying particular digits.
def draw_weight_matrix(W, unit_index = 0):
"""W is matrix of weights, with shape 784 x n_units in first layer of neural network"""
...
W = nnet.Ws[0]
n_units = W.shape[1]
n_plot_rows = round(np.sqrt(n_units) + 0.5)
n_plot_cols = n_plot_rows
plt.figure(figsize=(10, 10))
for i in range(n_units):
...
plt.tight_layout()
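Here is a hedged sketch of one way the two elided pieces above could be filled in, assuming the W passed in is nnet.Ws[0] with the bias weights in its first row (dropped before reshaping, as described above); adapt it to however your weights are stored.

def draw_weight_matrix(W, unit_index=0):
    # Sketch: drop the bias weight (assumed to be in the first row of W) and
    # show the remaining 784 weights as a 28 x 28 image, following draw_digit.
    plt.imshow(-W[1:, unit_index].reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.axis('off')
    plt.title(str(unit_index))

# Inside the loop above, each unit could then be drawn with, for example:
#     plt.subplot(n_plot_rows, n_plot_cols, i + 1)
#     draw_weight_matrix(W, i)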